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Fast multipole methods for 
three-dimensional N-body problems 

By P. Koumoutsakos 
1. Motivation and objectives 

We are developing computational tools for the simulations of three-dimensional 
flows past bodies undergoing arbitrary motions. High resolution viscous vortex 
methods have been developed that allow for extended simulations of two-dimensional 
configurations such as vortex generators. Our objective is to extend this methodol- 
ogy to three dimensions and develop a robust computational scheme for the simu- 
lation of such flows. 

A fundamental issue in the use of vortex methods is the ability of employing 
efficiently large numbers of computational elements to resolve the large range of 
scales that exist in complex flows. The traditional cost of the method scales as 
C?(N^) as the N computational elements/particles induce velocities at each other, 
making the method unacceptable for simulations involving more than a few tens of 
thousands of particles. In the last decade fast methods have been developed that 
have operation counts of O(NlogN) (Barnes and Hut, 1986) or C?(N) (Greengard and 
Rohklin, 1987) (referred to as BH and GR respectively) depending on the details 
of the algorithm. These methods are based on the observation that the effect of 
a cluster of particles at a certain distance may be approximated by a finite series 
expansion. In order to exploit this observation we need to decompose the element 
population spatially into clusters of particles and build a hierarchy of clusters (a 
tree data structure) - smaller neighboring clusters combine to form a cluster of the 
next size up in the hierarchy and so on. This hierarchy of clusters allows one to 
determine efficiently when the approximation is valid. This algorithm is an N-body 
solver that appears in many fields of engineering and science. Some examples of 
its diverse use are in astrophysics (Salmon and Warren 1992), molecular dynamics 
(Ding, Karasawa and Goddard 1992), micromagnetics (Yuan and Bertram 1992), 
boundary element simulations of electromagnetic problems (Kuster 1993), Nabors, 
Kim and White 1992), computer animation (Kuhn and Muller 1993), etc. More 
recently these N-body solvers have been implemented and applied in simulations 
involving vortex methods. Koumoutsakos and Leonard (1995) implemented the GR 
scheme in two dimensions for vector computer architectures allowing for simulations 
of bluff body flows using millions of particles. Winckelmans et al. (1995) presented 
three-dimensional, viscous simulations of interacting vortex rings, using vortons 
and an implementation of a BH scheme for parallel computer architectures. Bhatt 
ei al. (1995) presented a vortex filament method to perform inviscid vortex ring 
interactions, with an alternative implementation of a BH scheme for a Connection 
Machine parallel computer architecture. 

Historically the method of BH was first implemented for large scale astrophysi- 
cal simulations. Several N-body algorithms are extensions of tree codes originally 
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developed for gravitational interactions. This has been motivated not only by the 
ease of implementation to a variety of applications, but also by the fact that in 
three dimensions the scheme of GR suffers from the large computational cost of 
C?(p'*) associated with the translation of a p-th order multipole expansion. This 
cost has prohibited the use of laxge numbers of terms in the multipole expansions 
and lead to the general adoption of the BH method for N-body solvers. In a related 
effort to simplify the multipole expansion schemes, Anderson (1992) presented a 
computational scheme based on the Poisson Integral formula (hereafter referred to 
as PI ). Greengard (1988) presented a strategy to reformulate the translation of 
the expansions as a convolution operators, thus enabling the use of FFT’s and the 
reduction of the computational cost to C?(p^) operations. This strategy has been 
concisely summarized and extended in the work of Epton and Dembart (1995). 
Having overcome the (3(p^) difficulty, we believe that it is beneficial to follow the 
GR strategy as by using large number of expansions we avoid the costly pairwise 
interactions. The pairwise interactions determine the cost of the algorithm, and we 
try to minimize their number by using large numbers of expansions. Note that a 
large number of pairwise interactions may lead to algorithms of say 0(N^ ^) that 
would be inefficient for simulations involving hundreds of millions of particles (Bhatt 
et a/., 1995). The efficiency of the approach of using large numbers of terms in the 
expansions has been shown by our implementation of the method in two dimensions 
(Koumoutsakos 1996). 

The objective of this report is to present a summary of the GR multipole expan- 
sion scheme with efficient O(p^) multipole translations for general N-body problems. 
We discuss and compare the efficiency of computing the expansions based on the GR 
and the PI formulations. We document also the implementation of a suitable tree 
data structure for vector computer architectures by extending our previous work on 
the two-dimensional algorithm to strategies for a three-dimensional algorithm. 

2. Accomplishments 

We present a summary of the multipole expansions technique as presented by 
Greengard and Rohklin (1987) and Anderson (1992). We conduct some prelimi- 
nary computations to assess the accuracy and efficiency of the two techniques and 
describe the fa.st multipole translation theory following Epton and Dembart (1995). 
Finally we describe our tree data structure and its implementation so as to take 
advantage of vector computer architectures. 

2.1 The Greengard- Rohklin formulation 

In order to introduce the subject of multipole expansions, we consider a unit 
source at a point Q(x^) (Fig. 1). This unit source induces a potential at point P(x) 
given by: 


^(P;Q) = (1) 

|x - x'l 

where the spherical coordinates of x and x' are given by (r, ^, ^) and (/9, a,;S) re- 
spectively. The distance between the two points is denoted by R and the angle 
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Figure 1. Coordinate definition for multipole expansion. 


between vectors x and x' is denoted 7. If we define // = p/r and u = COS7 then the 
potential at point Q may be expressed as: 


^(P;Q) 


i - 1 

R ~ r 0 - 2u/i + pfl 


( 2 ) 


For ^ = p/r < 1, we use the generating formula for Legendre polynomials Pn so 
that the potential is expressed as: 


n 

= E (3) 

n = 0 

This equation describes the far field potential at a point P, due to a charge of 
unit strength centered at Q. To obtain a computationally tractable formulation we 
proceed to express the Legendre functions in terms of spherical harmonics: 

n 

P„(cos7)= E ¥-"•(«, (4) 


and the spherical harmonics in terms of the associated Legendre polynomials: 

The following numerically stable formulas are used for calculations: 

(n - m)P|;'(u) = (2n - 1) u Pj^-iH - (n + m - 1) P™_2(u) (6) 


and 


P-(u) = (-in2m-l)!(l-u2)^ 


( 7 ) 
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and 

P|;;(u) = (-in2m-l)!(l-u2)“ (7) 

Summarizing then we see that the far field representation of the potential induced 
by a collection of sources centered around Q with coordinates I3i) is 

expressed as: 

*(P;{qi)) = E E (8) 

n=0 m=— n 


where: 


m: 


Nv 


Y, qi Y-“ (ai,A) 


( 9 ) 


Note that if we wish to form a local expansion of the field firound the origin then 
we express 1/R as: 


1 




( 10 ) 


2. 2 Translation of multipole expansions 

Once the multipole expansions due to a collection of sources have been computed, 
one is usually interested in computing the far-held coefficients of the same collection 
expanded about some other point, say S, so that the potential would be represented 
as: 

^ « Tm 

^(S;P) = E E (11) 

n=0 m=— n 


where (<7, 0,$) are the spherical coordinates of the distance between points P and 
S. 

This defines the translation problem for multipole expansions for fast multipole 
methods. Following Greengard (1988) and Epton and Dembart (1994) we present a 
concise summ 2 try of the theory underlying the translation of multipole expansions. 
We make use of the following definitions of harmonic outer functions and inner 
functions 


07i^) = 

/-(x) = = i-l-"lA™r" 

More specifically for |x| > |x'| we obtain: 


j*n+l 

Y™(0,<^) 


oo n' 

- X') = E E (-1)"' 

n'= 0 m' = — n' 


( 12 ) 

( 13 ) 


( 14 ) 
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Figure 2. Sketch for the translation of the multipole expansions, 
and for the inner expansions we get that: 

oo 

C(f - X') = Y. Y. (-1)"' C-n“'(x) (15) 

n'= 0 m' =— n' 

SO now we may express the equation for the potential induced at point x from a 
unit source at point x' as: 


1 


X — 



In order to further exhibit the formulation of these translation operators we consider 
the configuration shown in Fig. 2. We wish to determine the potential induced by a 
collection of sources within a sphere centered at xq and having radius Rq (denoted 
as s(x“o,i?o) to a collection of points/sources within a sphere s(x 3 ,i? 3 ). This is 
achieved in the following steps: 

(i)We compute a set of multipole expansion coefficients (using Eq. 8) for the 
far-field representation of a set of sources distributed within s(fo, iZo) Then the far 
field representation of the field induced by this cluster of particles at a location x 
is given by: 

*W= E E (16) 

n 3= 0 m = — n 


(ii) We translate the Outer expansion about xq to an Outer expansion about x\ 
( child to parent ): 

1 = Oj = -I 


where 


i>; = E E c” 

n = 0 m = — n 


( 18 ) 
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(iii) We translate the expeinsion about x\ to an Inner expeinsion about x *2 ( 

box - box interaction ): 


oo 1 


V’(x)= 5] XI Ej lj(x-X2) 

1 = Oj = -I 


(19) 


where 

E d: oi-"(x5-x-.) (20) 

n = 0 m = — n 

(iv) We translate the Inner expansion about x *2 to an Inner expansion about ( 
parent to child ): 

oo 1 

V’(x)=X (21) 

1 = Oj = -1 

where 

•f- = E E E? C-7(«->5) (22) 

n = 0 m =: — n 

(v) Once the coefficients of the multipole expansions have been computed in the 
sphere s{x*zyRz) we perform a local expansion using Eq. 10 to compute the potential 
at the individual points. 

The above representations for the translation operations of spherical harmonics 
reveal that they require the evaluation of double summations that are essentially 
convolution operations over the coefficients of the expansions and can essentially 
be computed using 2-D FFT’s. 


2.3 The Poisson integral method 

In order to approximate the potential due to a collection of particles Anderson 
(1992) proposed an alternative simplified technique. This technique is based on the 
observation that a harmonic function {^) external to a sphere of radius R may be 
described using its boundary values g{R^ on the surface of the sphere. So given a 
point X and a point on the unit sphere that points in the direction of x then: 


1 ^ P 

/ E + 1)(7)“*’P.(5 ■ x-p) 

g 2 Ln - 0 


g(Rs) ds 


(23) 


where denotes the surface of the unit sphere and Pn is the n-th order Legendre 
polynomial. We use a quadrature formula to integrate the function on the surface of 
the sphere with K integration points and weights W{ to obtain an approximation 
of the form: 


1 r ^ R 

*W“i^E E(2”+iX7)"'"‘p-(«'Xp) 


i=l Ln = 0 


g(Rsi) wi 


(24) 
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Figure 3. Sketch for the translation of the multipole expansions using the PI 
technique. 


In order to compute the far field multipole expansion of a cluster of particles based 
on this formulation, the function g{R^ is determined on certain quadrature points 
on the surface of a sphere using the direct interactions of the potential induced by 
the sources onto the evaluation points. Subsequently these coefficients are used in 
Eq. 24 to compute the potential induced at distances sufficiently large compared 
to the radius of the cluster. In order to translate the expansions the above formula 
may be used again by considering the coefficients of g{Rsi) on the inner sphere to be 
sources themselves. The method is completed by observing that a local expansion 
approximation may be constructed using the following formula. Note that the 
expansions are in terms of r/R in this formula. 


^(x) 


1 K r 


i=l Ln = 0 


g(RsI) Wi 


(25) 


Anderson (1992) showed that approximations with M = 2p + 1 terms may be 
compared with multipole schemes that have p terms retained in the expansions. 
The strength of this method relies on its simplicity and its easy extension from two 
to three dimensions. This is exhibited by considering the implementation of this 
technique in the context of an O (JV) algorithm for the computation of the potential 
field induced by a set of particles within a sphere s(xo, i?o) to a cluster of points in 
(Fig. 4). This interaction is performed in the following steps: 

(i) The potential induced by the particles (denoted by dots) is computed on 
quadrature points properly selected on the surface of the enclosing sphere (denoted 
by + ), thus constructing the function ^(iio'^i) 

(ii) The potential induced by the quadrature points on s{xo,Rq) is computed on 
the quadrature points of s(fi,i?i) using Eq. 24. 

(hi) The coefficients g{R\Si) are considered to be sources themselves so that the 
coefficients g{R 2 ^i) are computed using Eq. 24. 

(iv) The coefficients g{RzSi) are computed subsequently by performing a local 
expansion from the quadrature points on sphere s(x 2 ,i? 2 ) to the quadrature points 
on the sphere s{x 2 iR 2 ) using Eq. 25. 
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Figure 4, Relative error of the GR and PI multipole expansion schemes. Symbols: 

(GR): : P = 8; (PI); : m = 14, : m = 9, m = 7, : 

m = 5. 

(v) By using Eq. 25 the potential on the particles inside the sphere ^(f 3 ,ii 3 ) is 
computed using the coefficients 

The simplicity of the formulas implemented in this technique make it an attrac- 
tive alternative to the multipole expansion method of GR. We consider below a 
comparison of the two methods in terms of their accuracy and computational cost. 

2,4 Computational cost 

The computational cost, associated with the multipole expansions of the GR 
scheme, scales as 0{p^ N) for the multipole- particle operations and as C?(logN p^) 
for the multipole translation operations using the convolution formulation discussed 
above. For the Poisson Integral formulation, assuming K integration points and M 
terms in the Poisson kernel, the cost scales as 0{K x M x N) for the particle- 
multipole interactions and as C?(K x K x M) for the multipole translation. 

Both algorithms have error terms that scale as where L corresponds 

to p terms in the multipole method and to M + 1 for the Poisson Integral scheme. 
The number of the required integration points (K) corresponding to 5th, 7th, 9th, 
and 14th order quadrature formulas require 12, 24, 32, and 72 points respectively 
or approximately A ^ 2m^/5 points for an m-th order integration formula. So 
in order to achieve the seune accuracy with the PI method as with the multipole 
scheme we require that m Af « P. This then implies a computational cost of 
C?(p^N) for the particle- multipole interactions while it implies a C?(logNp®) cost 
eissociated with the multipole translations. 

Although such estimates depend on the particular implementation of the method, 
it is evident that for the same order of accuracy the multipole method with fast 
translation operations would lead to much faster computations, especially for large 
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expansion orders. Moreover, in the particular implementation of the methods it was 
easier to unroll the computational loops involving the particle multipole interactions 
for the GR scheme than the respective operations in the PI scheme. In Fig. 4 we 
present also the relative error as computed by the two methods using P = 8 and 
different values of m in the Poisson Integral method. 

In the following table we present some representative timing results for the con- 
struction of the multipole expansions as well as for the particle-multipole interac- 
tions. A number of N particles was distributed randomly inside a cube of side 2 and 
the potential was evaluated on L points uniformly distributed along a line extending 
from the center of the cube and along one of its sides. The first line corresponds to 
the CPU time required for the construction of the multipol e expansions while the 
second line corresponds to the CPU time required to perform the particle-multipole 
expansions. It is observed then that using the GR multipole expansion scheme 
an order of magnitude faster calculations are achieved for the same order of accu- 
racy. This dictates the use of the GR technique for computations using multipole 
expansions. 

2.5 Tree data structures 

A key issue in the implementation of fast multipole methods is the associated 
data structure and the computer platform. The present methodology has been suc- 
cessfully implemented for vector computer architectures and several of its feat ures 
could be carried over to parallel platforms involving large numbers of vector proces- 
sors. The scheme of GR relies in a predetermined tree data structure and a large 
number of terms to be kept in the expansions while the scheme of BH determines 
the interaction list while traversing the tree data structure, and a tradeoff is made 
between the number of terms kept in the expansions (usually two to four) and the 
distance at which the expansions are favored over direct interactions. 

In order to exploit the observation that the effect of a cluster of particles at a 
certain distance may be approximated by a finite sum of series expansions using 
the equations described above, we need to orgzmize the particles in a hierarchy of 
clusters. This hierarchy of clusters allows one to efficiently determine when the 
approximation is valid. In order to establish the particle clusters one may resort to 
a tree building algorithm. 

The straightforward method of computing the pairwise interaction of all particles 
requires O(N^) operations for N vortex elements. In the last decade fa^t methods 
have been developed that have operation counts of C?(A^logA^) (Barnes and Hut, 
1986) or 0{N) (Greengard and Rohklin, 1987) depending on the details of the 
algorithm. The basic idea of these methods is to decompose the element population 
spatially into clusters of particles and build a hierarchy of clusters or a tree - smaller 
neighboring clusters combine to form a cluster of the next size up in the hierarchy 
and so on. 

The contribution of a cluster of particles to the potential of a given vortex can 
then be computed to desired accuracy if the particle is sufficiently far from the 
cluster in proportion to the size of the cluster and a sufficiently large number of 
terms in the multipole expansion is taken. This is the essence of the ^particle- box’ 
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N 

L 

m = 5 

m = 7 

m = 9 

m = 14 

Miltipoles 
P =8 

5 

10 

10^ 

0.0708 

0.1397 

0,1863 

0.4191 

0.1023 

0.5268 

0.5844 

0.6395 

1.0870 

0.0959 

6 

10 

6 

10 

0.7111 

1.4219 

1.8968 

4.2680 

1.0251 

5.2724 

5.8516 

6.4145 

10.8925 

0.9585 


Figure 5. Computational cost of the FMM ajid Poisson integral method. 

method, requiring 0{NlogN) operations. One then tries to minimize the work 
required by maximizing the size of the cluster used while keeping the number of 
terms in the expansion within a reasonable limit and maintaining a certain degree 
of accuracy. 

The ‘box-box’ scheme goes one step further as it accounts for box-box interactions 
as well. These interactions are in the form of shifting the expansions of a certain 
cluster with the desired accuracy to the center of another cluster. Then those 
expansions are used to determine the velocities of the particles in the second cluster. 
This has as an effect the minimization of the tree traversals for the individual 
particles requiring only 0{N) operations. 

In order to construct the tree data structure, the three-dimensional space is con- 
sidered to be a cube enclosing all computational elements. We apply the operation 
of continuously subdividing a cube into eight identical cubes until each cube has 
only a certain maximum number of particles in it or the maximum allowable level 
of subdivisions has been reached. 

The hierarchy of boxes defines a tree data structure which is common for both 
algorithms. The tree construction proceeds level by level starting at the finest level 
of the paxticles and proceeding upwards to coarser box levels. Due to the simplicity 
of the geometry of the computational domain, the addressing of the elements of the 
data structure is facilitated significantly. As the construction proceeds pointers are 
assigned to the boxes so that there is direct addressing of the first and last particle 
index in them as well as direct access to their children and parents. This facilitates 
the computation of the expansion coefficients of the children from the expansions 
of the parents for the BB eilgorithm and the expansions of the parents from those 
of the children for the PB 

The data structure is used to determine when the expansions are to be used and 
when pairwise interactions have to be calculated. It helps in communicating to the 
computer the geometric distribution of the particles in the computational domeiin. 
The peirticles reside at the finest level of the structure. Clusters of particles form 
the interior nodes of the tree and hierarchical relations are established. The data 
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structure adds to the otherwise minimal memory requirements of the vortex method. 

The tree has to be reconstructed at every step as the particles change positions 
in the domain. There are several ways that nearby particles could be clustered 
together and some of the decisions to be made axe: 

The center of expansions. In the present study the geometric center of the cells 
is used as it facilitates the addressing of the data structure. 

The cluster size. In the present algorithms we follow a hybrid strategy as we keep 
at least Lmin particles per box until we reach a predetermined finest level of boxes. 
The number Lmin may be chosen by the user depending on the particle population 
and configuration so as to achieve an optimal computational cost. 

Addressing the clusters. As particles are usually associated with a certain box, it 
is efficient to sort the particle locations in the memory so that particles that belong 
to the same box occupy adjacent locations in the memory devoted to the particle 
arrays. Such memory allocation enhances the vectorization significantly as very 
often we loop over particles of the same box (e.g., to construct the expansions at 
the finest level, or to compute interactions) and the loops have an optimum stride 
of one. 

Description of the Algorithms: 

In both algorithms, described herein, we may distinguish three stages: 

• Building the data structure (tree). 

• Establishing the interaction lists (by non- recursively descending the tree). 

• Pairwise interactions for all particles in the domain. 

The building of the data structure is common for both algorithms, but they differ 
in the tree descent and the pairwise interactions. Care has been exercised at all 
stages to maximize vectorization. In our respective two-dimensional implementa- 
tion, the building of the data structure consumes about 5 — 7% of the time, the 
descent consumes another 5 - 10% so that the largest amount is spent in computing 
the pairwise interactions. 

2.6 The particle-box algorithm 

Step 1. Building the data structure (tree) 

Step la : For each of the cubes at each level that are not further subdivided, 
we compute the p-terms of the multipole expansions. These expansions are used 
to describe the influence of the particles at locations that are well separated from 
their cluster. 

Step lb : The expansions of all parent boxes are constructed by shifting the ex- 
pansion coefficients of their children. The tree is traversed upwards in this stage. 
Rather than constructing the expansions of all the members of a family (that is 
traverse each branch until the root is reached) we construct the expansions of all 
parent boxes at each level simultaneously. This enables long loops over the parent 
boxes at each level. Care is taken so that the procedure is fully vectorized by taking 
advantage of the regularity of the data structure and the addressing of the boxes in 
the memory. Moreover, the regularity of the data structure allows us to precompute 
many coefficients that are necessary for the expansions. Straightforward implemen- 
tation of these translations leads to computational cost of 0{p^). This has been 
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the major reason that most implementations of the algorithm have employed only 
up to p=3 terms in the multipole expansions. However, such an approach results 
in large numbers of parti cle-particle interactions and hence a large computational 
cost. We implement the technique proposed by Greengard (1988) (see above) to 
reduce the computational cost of this translation to O(p^) operations, by observing 
that this translation amounts to a convolution, and employ FFT’s. 

Step 2. Establishing of interaction lists 

In the present algorithm a breadth-first search is performed at each level to 
establish the interaction lists of each particle (cell). This search is facilitated by 
the regularity of the data structure and the identification arrays of the cells in the 
tree. At each level interaction lists are established for the particles (cells) by looping 
across the cells of a certain level. 

Note that this depth-first search for interaction lists is further facilitated by the 
fact that every particle belongs to a childless box. It is easy then to observe that 
all particles in the same box share the same interaction list, which is comprised of 
members of the tree that belong to coarser levels. In this way the tree is traversed 
upwards for all particles in a childless box together and downwards separately for 
each particle. It is evident that this procedure is more efficient for uniformly clus- 
tered configurations of particles because there would be more particles that belong 
to childless boxes at the finest level. 

Step 3. Computation of the interactions. 

Once the interaction lists have been established, the velocities of the particles 
axe computed by looping over the elements of the lists. For particles that have 
the same boxes in their interaction list, this is performed simultaneously so that 
memory referencing is minimized. Moreover by systematically traversing the tree, 
the particle-particle interactions are made symmetric so that the cost of this com- 
putation is halved. The cost of this step is O(ATp^). 

2.7 The box-box algorithm. 

This scheme is similar to the PB scheme except that here every node of the 
tree assumes the role of a particle. In other words, interactions are not limited to 
particle-particle and particle-box but interactions between boxes are considered as 
well. Those interactions are in the form of shifting the expansion coefficients of 
one box into another 2 uid the interaction lists are established with respect to the 
locations of every node of the tree. 

The scheme distinguishes five categories of interacting elements of the tree with 
respect to a cell denoted by c. 

• List 1: All childless boxes neighboring c. 

• List 2: Children of colleagues (boxes of the same size) of c’s parents that are 
well separated from c. All such boxes belong to the same level with c. 

• List 3: Descendants (not only children) of c’s colleagues , whose parents are 
adjacent to c but axe not adjacent to c themselves. All such boxes belong to finer 
levels. 

• List 4: All boxes such that box c belongs to their List 3. All such boxes are 
childless and belong to coarser levels. 
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•List 5: All boxes well separated from c’s parents. Boxes in this category do 
not interact directly with the cell c. 

If the cell c is childless it may have interacting pairs that belong to all four lists. 
However if it is a parent it is associated with boxes that belong to lists 2 and 4 as 
described above. These observations are directly applied in our algorithm and we 
may distinguish again the following 4 steps. 

Step 1: Building the data structure* 

This procedure is the same as for the PB scheme. This fact enables us to compare 
directly the two algorithms and assess their efficiency. 

Step 2: Construction of interaction lists. 

To establish the interaction lists we proceed again level by level, starting at 
the coarsest level. For each level we distinguish childless and parent boxes. In 
establishing lists 1 and 3 we need only loop over childless boxes where 2 is to establish 
lists 2 and 4 we loop over all cells that are active in a certain level. 

Step 2a: Here we establish lists 1 and 3. We start at the level of the parents of 
box c and we proceed level by level examining again breadth first, until we reach 
the finest level of the structure (the particles). The elements of lists 1 are basically 
the particles and account for the particle-particle interactions. Care is exercised so 
that this computation is symmetric, and we need to traverse the tree downwards 
only thus minimizing the search cost. The elements of List 3 axe the boxes and are 
accountable for the particle-box interactions in this scheme. 

Step 2b: Here we establish interaction lists 2 and 4 for all boxes in the hierarchy. 
We start at the coarsest possible level and proceed downward until reaching the 
level of box c to establish the interaction lists. To do so for a certain box we start 
by examining boxes that are not well separated from their parents (otherwise they 
would have been dealt with at the coarser level). Subsequently the children of those 
boxes are examined to establish interaction lists. 

Step 3: Computations of the interactions 

In this scheme we consider three kinds of interactions: the box- box, particle- box, 
and particle-particle interactions. The latter two categories were discussed in the 
previous section. For the box-box interactions once the respective interaction lists 
have been established (with members of lists 2 and 4), we need to transfer those 
expansions down to the ones of the children and add them to the existing ones. This 
procedure is vectorized by looping over the number of boxes at each level. The use 
of pointers to access the children of each box enhances this vectorization. Note that 
an arbitrarily high number of expansions can be calculated efficiently by unrolling 
the loop over the number of expansions into the previously mentioned loop. 

3. Conclusions and future work 

We are in the process of developing a three-dimensional N-body problem solver 
with the objective of applying it to the solution of engineering problems involv- 
ing boundary element methods. This solver would be mainly used for the imple- 
mentation of vortex methods for three-dimensional simulations involving unsteady 
flows past complex moving configurations. Furthermore, of particular interest is the 
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application of the code to simulations of reurefied flows using molecular dynamics 

methods. The code is envisioned as a computationcJ tool that would easily enable 

the treinsition between continuum flows aad flows using molecular dynamics. 
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